68 research outputs found

    String Indexing with Compressed Patterns

    Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
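For orientation, the baseline that the abstract contrasts itself with (the server decompressing the query before matching) can be sketched as follows. This is a minimal illustration, not the paper's data structure. It assumes one common LZ77 encoding in which each phrase is a hypothetical triple `(source_position, copy_length, next_char)`; the function names are illustrative only.

```python
from typing import List, Tuple

# Assumed phrase format (one common LZ77 variant, not taken from the paper):
# copy `copy_length` characters starting at `source_position` of the text
# decoded so far, then append `next_char` (may be "" for no extra character).
Phrase = Tuple[int, int, str]


def lz77_decode(phrases: List[Phrase]) -> str:
    """Decompress a pattern given as LZ77 phrases."""
    out: List[str] = []
    for source_position, copy_length, next_char in phrases:
        # Copy character by character so that a phrase may overlap itself
        # (the source may run into the text being produced).
        for offset in range(copy_length):
            out.append(out[source_position + offset])
        if next_char:
            out.append(next_char)
    return "".join(out)


def naive_query(text: str, phrases: List[Phrase]) -> bool:
    """Baseline: decompress the pattern fully, then search for it directly."""
    return lz77_decode(phrases) in text
```

The cost of this baseline grows with the decompressed pattern length, which is the dependence the abstract aims to replace with the compressed size.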

    Differentially Private Approximate Pattern Matching

    In this paper, we consider the $k$-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string $S$ which have a Hamming distance at most $k$ to a pattern $P$, or decide whether such a substring exists. In our definition of privacy, individual positions of the string $S$ are protected. To be able to answer queries under differential privacy, we allow some slack on $k$, i.e. we allow reporting or counting substrings of $S$ with a distance at most $(1+\gamma)k+\alpha$ to $P$, for a multiplicative error $\gamma$ and an additive error $\alpha$. We analyze which values of $\alpha$ and $\gamma$ are necessary or sufficient to solve the $k$-approximate pattern matching problem while satisfying $\epsilon$-differential privacy. Let $n$ denote the length of $S$. We give 1) an $\epsilon$-differentially private algorithm with an additive error of $O(\epsilon^{-1}\log n)$ and no multiplicative error for the existence variant; 2) an $\epsilon$-differentially private algorithm with an additive error $O(\epsilon^{-1}\max(k,\log n)\cdot\log n)$ for the counting variant; 3) an $\epsilon$-differentially private algorithm with an additive error of $O(\epsilon^{-1}\log n)$ and multiplicative error $O(1)$ for the reporting variant for a special class of patterns. The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of $S$ with distance at most $k$ to $P$, then the algorithm returns a substring of $S$ with distance at most $(1+\gamma)k+\alpha$ to $P$. Further, we complement these results by a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of $\Omega(\epsilon^{-1}\log n)$ with constant probability. Comment: This is a full version of a paper accepted to ITCS 202
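The non-private problem underlying this abstract can be sketched in a few lines. The code below is a naive, exact baseline and deliberately contains no privacy mechanism; it is not the algorithm from the paper. The function names are hypothetical, and `relaxed_threshold` simply evaluates the slack expression quoted in the abstract.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(s: str, p: str, k: int) -> List[int]:
    """Start positions of substrings of s within Hamming distance k of p.

    Naive O(|s| * |p|) scan, for illustration only.
    """
    m = len(p)
    return [
        i
        for i in range(len(s) - m + 1)
        if hamming_distance(s[i:i + m], p) <= k
    ]


def relaxed_threshold(k: int, gamma: float, alpha: float) -> float:
    """The relaxed distance bound (1 + gamma) * k + alpha from the abstract."""
    return (1 + gamma) * k + alpha
```

Reporting corresponds to the returned list, counting to its length, and existence to whether it is non-empty.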

    Gapped Indexing for Consecutive Occurrences

    The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P₁ and P₂ and a gap range [α, β] we can quickly find the consecutive occurrences of P₁ and P₂ with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P₁|+|P₂|+n^{2/3}) for existence and counting and Õ(|P₁|+|P₂|+n^{2/3}occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using Õ(n) space must use Ω̃(|P₁| + |P₂| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
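To make the query concrete, here is a naive scan that answers a gapped consecutive occurrence query without any index. It is only a reference for the problem statement, not the data structure from the paper. It assumes that a consecutive occurrence is a pair (i, j), i < j, where the first pattern occurs at i, the second at j, and neither pattern occurs strictly between them; the function names are hypothetical.

```python
from typing import List, Tuple


def occurrences(s: str, p: str) -> List[int]:
    """All start positions of p in s, overlapping occurrences included."""
    positions = []
    i = s.find(p)
    while i != -1:
        positions.append(i)
        i = s.find(p, i + 1)
    return positions


def gapped_consecutive(
    s: str, p1: str, p2: str, lo: int, hi: int
) -> List[Tuple[int, int]]:
    """Consecutive occurrences (i, j) of p1 then p2 with lo <= j - i <= hi."""
    occ1 = set(occurrences(s, p1))
    occ2 = set(occurrences(s, p2))
    # Merge all occurrence positions; neighbours in this sorted list have
    # no occurrence of either pattern strictly between them.
    events = sorted(occ1 | occ2)
    return [
        (i, j)
        for i, j in zip(events, events[1:])
        if i in occ1 and j in occ2 and lo <= j - i <= hi
    ]
```

Existence and counting queries follow from the same list (non-emptiness and length respectively).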

    String Indexing for Top-k Close Consecutive Occurrences

    The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (Sitcco). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0,1]. Our first result achieves O(n log n) space and optimal query time of O(m+k), and our second result achieves linear space and query time O(m+k^{1+ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
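The definition of a top-k close consecutive occurrence query can be illustrated with a naive scan over the text. This is a sketch of the problem statement only, with no index and none of the time or space bounds from the abstract; the function names are hypothetical.

```python
from typing import List, Tuple


def occurrences(s: str, p: str) -> List[int]:
    """All start positions of p in s, overlapping occurrences included."""
    positions = []
    i = s.find(p)
    while i != -1:
        positions.append(i)
        i = s.find(p, i + 1)
    return positions


def top_k_close_consecutive(s: str, p: str, k: int) -> List[Tuple[int, int]]:
    """The k consecutive occurrences (i, j) of p in s with smallest j - i."""
    occ = occurrences(s, p)
    # Adjacent occurrences are exactly the consecutive occurrences:
    # by construction no occurrence of p lies between them.
    pairs = list(zip(occ, occ[1:]))
    # Stable sort by distance, so ties keep their left-to-right order.
    pairs.sort(key=lambda pair: pair[1] - pair[0])
    return pairs[:k]
```

For example, in `"axxaxaxxxxa"` the pattern `"a"` occurs at positions 0, 3, 5 and 10, so the closest consecutive occurrence is (3, 5) with distance 2.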

    Compressed Indexing for Consecutive Occurrences

    The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern P. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns P₁ and P₂ and a range [a,b], and one must find all consecutive occurrences (q₁,q₂) of P₁ and P₂ such that q₂-q₁ ∈ [a,b]. By their results, we cannot hope for a very efficient indexing structure for such queries, even if a = 0 is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.

    String Indexing for Top-$k$ Close Consecutive Occurrences

    The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i,j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$, and their distance is defined as $j-i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to the length of $P$ and $k$. We give two time-space trade-offs for the problem. Let $n$ be the length of $S$, $m$ the length of $P$, and $\epsilon\in(0,1]$. Our first result achieves $O(n\log n)$ space and optimal query time of $O(m+k)$, and our second result achieves linear space and query time $O(m+k^{1+\epsilon})$. Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees. Comment: Fixed typos, minor change

    Reciprocity between narrative, questioning and imagination in the early and primary years: examining the role of narrative in possibility thinking

    The concept of Possibility Thinking (PT) as a driving force of creativity has been investigated both conceptually and empirically for over a decade in early years settings and primary classrooms in England. In the first wave of qualitative empirical studies, play formed part of the enabling context. Criteria for episode selection for PT analysis were that episodes exhibited children immersed in sustained focused playful activity. During the second wave of PT studies, the research team’s attention was drawn to children’s imaginative storying in such playful contexts and it emerged that consideration of narrative in PT might prove fruitful. The current paper revisits key published work, and drawing on data previously analysed for features of PT, seeks to explore how narrative might relate to the current theorised framework. Fourteen published PT episodes are re-analysed in order to consider the role and construction of narrative in PT. The new analysis reveals that narrative plays a foundational role in PT, and that reciprocal relationships exist between questioning, imagination and narrative, layered between children and adults. Consequences for nurturing children’s creativity and for future PT research are explored

    Effect of natalizumab on disease progression in secondary progressive multiple sclerosis (ASCEND). a phase 3, randomised, double-blind, placebo-controlled trial with an open-label extension

    Background: Although several disease-modifying treatments are available for relapsing multiple sclerosis, treatment effects have been more modest in progressive multiple sclerosis and have been observed particularly in actively relapsing subgroups or those with lesion activity on imaging. We sought to assess whether natalizumab slows disease progression in secondary progressive multiple sclerosis, independent of relapses. Methods: ASCEND was a phase 3, randomised, double-blind, placebo-controlled trial (part 1) with an optional 2 year open-label extension (part 2). Enrolled patients aged 18–58 years were natalizumab-naive and had secondary progressive multiple sclerosis for 2 years or more, disability progression unrelated to relapses in the previous year, and Expanded Disability Status Scale (EDSS) scores of 3·0–6·5. In part 1, patients from 163 sites in 17 countries were randomly assigned (1:1) to receive 300 mg intravenous natalizumab or placebo every 4 weeks for 2 years. Patients were stratified by site and by EDSS score (3·0–5·5 vs 6·0–6·5). Patients completing part 1 could enrol in part 2, in which all patients received natalizumab every 4 weeks until the end of the study. Throughout both parts, patients and staff were masked to the treatment received in part 1. The primary outcome in part 1 was the proportion of patients with sustained disability progression, assessed by one or more of three measures: the EDSS, Timed 25-Foot Walk (T25FW), and 9-Hole Peg Test (9HPT). The primary outcome in part 2 was the incidence of adverse events and serious adverse events. Efficacy and safety analyses were done in the intention-to-treat population. This trial is registered with ClinicalTrials.gov, number NCT01416181. Findings: Between Sept 13, 2011, and July 16, 2015, 889 patients were randomly assigned (n=440 to the natalizumab group, n=449 to the placebo group). 
In part 1, 195 (44%) of 439 natalizumab-treated patients and 214 (48%) of 448 placebo-treated patients had confirmed disability progression (odds ratio [OR] 0·86; 95% CI 0·66–1·13; p=0·287). No treatment effect was observed on the EDSS (OR 1·06, 95% CI 0·74–1·53; nominal p=0·753) or the T25FW (0·98, 0·74–1·30; nominal p=0·914) components of the primary outcome. However, natalizumab treatment reduced 9HPT progression (OR 0·56, 95% CI 0·40–0·80; nominal p=0·001). In part 1, 100 (22%) placebo-treated and 90 (20%) natalizumab-treated patients had serious adverse events. In part 2, 291 natalizumab-continuing patients and 274 natalizumab-naive patients received natalizumab (median follow-up 160 weeks [range 108–221]). Serious adverse events occurred in 39 (13%) patients continuing natalizumab and in 24 (9%) patients initiating natalizumab. Two deaths occurred in part 1, neither of which was considered related to study treatment. No progressive multifocal leukoencephalopathy occurred. Interpretation: Natalizumab treatment for secondary progressive multiple sclerosis did not reduce progression on the primary multicomponent disability endpoint in part 1, but it did reduce progression on its upper-limb component. Longer-term trials are needed to assess whether treatment of secondary progressive multiple sclerosis might produce benefits on additional disability components. Funding: Biogen

    Definition, aims, and implementation of GA2LEN/HAEi Angioedema Centers of Reference and Excellence


    Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD

    Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). PQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p 10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10^{-392}) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R^2) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema. 
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTLs and eQTLs SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group